AITopics | Leisure & Entertainment

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

Neural Information Processing Systems | May-31-2025, 06:48:27 GMT

Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite advances in large language models (LLMs), recent agents built on them often neglect control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies because potential role changes increase the uncertainty and complexity of the game. In this work, we first establish the existence of Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results show that discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from these analyses, we propose an RL-instructed language agent framework, in which a discussion policy trained by reinforcement learning (RL) determines which discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of the proposed framework.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Randomized Exploration for Reinforcement Learning with Multinomial Logistic Function Approximation

Neural Information Processing Systems | May-31-2025, 05:48:04 GMT

Reinforcement learning (RL) is a sequential decision-making problem in which an agent tries to maximize its expected cumulative reward by interacting with an unknown environment over time.
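
As a hedged illustration of the objective described above, it can be written in assumed notation for an episodic setting. The horizon $H$, policy $\pi$, reward functions $r_h$, and transition kernels $P_h$ are notation introduced here for illustration and are not given in the excerpt:

```latex
\[
  \max_{\pi} \; V^{\pi}(s_1)
  \;=\;
  \mathbb{E}_{\pi}\!\left[\, \sum_{h=1}^{H} r_h(s_h, a_h) \;\middle|\; s_1 \right],
  \qquad
  s_{h+1} \sim P_h(\cdot \mid s_h, a_h),
\]
```

where the kernels $P_h$ are unknown to the agent and must be learned through interaction.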

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > South Korea (0.14)
Africa > Ethiopia (0.13)

Genre: Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.41)

Continual Audio-Visual Sound Separation

Yiyang Nan, Shijian Deng, Shentong Mo

Neural Information Processing Systems | May-31-2025, 04:26:36 GMT

In this paper, we introduce a novel continual audio-visual sound separation task, aiming to continuously separate sound sources for new classes while preserving performance on previously learned classes, with the aid of visual guidance. This problem is crucial for practical visually guided auditory perception as it can significantly enhance the adaptability and robustness of audio-visual sound separation models, making them more applicable for real-world scenarios where encountering new sound sources is commonplace. The task is inherently challenging as our models must not only effectively utilize information from both modalities in current tasks but also preserve their cross-modal association in old tasks to mitigate catastrophic forgetting during audio-visual continual learning. To address these challenges, we propose a novel approach named ContAV-Sep (Continual Audio-Visual Sound Separation). ContAV-Sep presents a novel Cross-modal Similarity Distillation Constraint (CrossSDC) to uphold the cross-modal semantic similarity through incremental tasks and retain previously acquired knowledge of semantic similarity in old models, mitigating the risk of catastrophic forgetting. The CrossSDC can seamlessly integrate into the training process of different audio-visual sound separation frameworks. Experiments demonstrate that ContAV-Sep can effectively mitigate catastrophic forgetting and achieve significantly better performance compared to other continual learning baselines for audio-visual sound separation.

artificial intelligence, machine learning, natural language, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel (0.14)
Oceania > Australia (0.14)
North America > United States (0.14)
Europe > Germany (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Leisure & Entertainment (0.93)
Media > Music (0.46)
Health & Medicine > Consumer Health (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.55)

Deep Gamblers: Learning to Abstain with Portfolio Theory

Ziyin Liu, Zhikang Wang, Paul Pu Liang, Russ R. Salakhutdinov, Louis-Philippe Morency, Masahito Ueda

Neural Information Processing Systems | May-31-2025, 02:47:52 GMT

We deal with the selective classification problem (a supervised-learning problem with a rejection option), where we want to achieve the best performance at a certain level of coverage of the data. We transform the original m-class classification problem into an (m + 1)-class problem, where the (m + 1)-th class represents the model abstaining from making a prediction due to disconfidence. Inspired by portfolio theory, we propose a loss function for the selective classification problem based on the doubling rate of gambling. Minimizing this loss function corresponds naturally to maximizing the return of a horse race, where a player aims to balance between betting on an outcome (making a prediction) when confident and reserving one's winnings (abstaining) when not confident. This loss function allows us to train neural networks and characterize the disconfidence of prediction in an end-to-end fashion. In comparison with previous methods, our method requires almost no modification to the model inference algorithm or model architecture. Experiments show that our method can identify uncertainty in data points and achieves strong results on SVHN and CIFAR10 at various coverages of the data.
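
The abstract does not state the loss itself. As a minimal sketch only, the code below assumes one form consistent with the horse-race description: the model outputs m + 1 scores, the last one being the abstention output, and the per-example loss is the negative log of `payoff * p_true + p_abstain`. The function name `gambler_loss` and the `payoff` hyperparameter are illustrative assumptions, not details taken from the abstract.

```python
import math


def gambler_loss(logits, label, payoff):
    """Negative log return for one example (assumed form, for illustration).

    logits: list of m + 1 scores; the last entry is the abstention output.
    label:  index of the true class, in the range [0, m).
    payoff: assumed hyperparameter, the return per unit bet on the true class.
    """
    if not 0 <= label < len(logits) - 1:
        raise ValueError("label must index one of the m real classes")

    # Numerically stable softmax over all m + 1 outputs.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    p_true = probs[label]    # fraction of wealth bet on the true class
    p_abstain = probs[-1]    # fraction of wealth held back (abstention)

    # Wealth after the race: the winning bet is multiplied by the payoff,
    # the reserved fraction is kept, and all other bets are lost.
    return -math.log(payoff * p_true + p_abstain)
```

Under this assumed form, a confident wrong prediction is penalized heavily, while shifting probability mass to the abstention output caps the loss, which is the trade-off the abstract describes.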

artificial intelligence, classification problem, machine learning, (14 more...)

Neural Information Processing Systems

Country: North America > United States > New York (0.14)

Industry:

Education (0.48)
Health & Medicine (0.46)
Leisure & Entertainment > Gambling (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Language as an Abstraction for Hierarchical Deep Reinforcement Learning

YiDing Jiang, Shixiang (Shane) Gu, Kevin P. Murphy, Chelsea Finn

Neural Information Processing Systems | May-31-2025, 02:30:04 GMT

Solving complex, temporally-extended tasks is a long-standing problem in reinforcement learning (RL). We hypothesize that one critical element of solving such problems is the notion of compositionality. With the ability to learn concepts and sub-skills that can be composed to solve longer tasks, i.e. hierarchical RL, we can acquire temporally-extended behaviors. However, acquiring effective yet general abstractions for hierarchical RL is remarkably challenging. In this paper, we propose to use language as the abstraction, as it provides unique compositional structure, enabling fast learning and combinatorial generalization, while retaining tremendous flexibility, making it suitable for a variety of problems. Our approach learns an instruction-following low-level policy and a high-level policy that can reuse abstractions across tasks, in essence, permitting agents to reason using structured language. To study compositional task learning, we introduce an open-source object interaction environment built using the MuJoCo physics engine and the CLEVR engine. We find that, using our approach, agents can learn to solve diverse, temporally-extended tasks such as object sorting and multi-object rearrangement, including from raw pixel observations. Our analysis reveals that the compositional nature of language is critical for learning diverse sub-skills and systematically generalizing to new sub-skills in comparison to non-compositional abstractions that use the same supervision.

machine learning, natural language, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

BricksRL: A Platform for Democratizing Robotics and Reinforcement Learning Research and Education with LEGO

Neural Information Processing Systems | May-31-2025, 02:09:44 GMT

We present BricksRL, a platform designed to democratize access to robotics for reinforcement learning research and education.

artificial intelligence, machine learning, reinforcement learning, (20 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.93)

Industry:

Leisure & Entertainment (0.67)
Education > Educational Setting > Online (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text

Xinyang Li

Neural Information Processing Systems | May-31-2025, 01:52:05 GMT

Recent advancements in 3D generation have leveraged synthetic datasets with ground truth 3D assets and predefined camera trajectories. However, the potential of adopting real-world datasets, which can produce significantly more realistic 3D scenes, remains largely unexplored. In this work, we delve into the key challenge of the complex and scene-specific camera trajectories found in real-world captures. We introduce Director3D, a robust open-world text-to-3D generation framework, designed to generate both real-world 3D scenes and adaptive camera trajectories. To achieve this, (1) we first utilize a Trajectory Diffusion Transformer, acting as the Cinematographer, to model the distribution of camera trajectories based on textual descriptions.

artificial intelligence, machine learning, natural language, (13 more...)

Neural Information Processing Systems

Country:

Asia > Japan > Honshū > Chūbu (0.14)
Asia > China > Fujian Province (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Transportation (0.46)
Media > Film (0.34)
Leisure & Entertainment (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

A Omitted Proofs

Neural Information Processing Systems | May-31-2025, 01:51:52 GMT

In this section we include all the proofs deferred from the main body. For the convenience of the reader, all claims will be restated. Before we proceed with the proof of Theorem 2.3, let us point out some additional notational conventions. Thus, the theorem follows directly from (6). Suppose that both players employ (OMD) with learning rate η > 0. Consider a bimatrix game (A, B), and suppose that both players employ (OMD) with learning rate η > 0 and a G-smooth regularizer.

artificial intelligence, inequality, machine learning, (18 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.70)

Optimistic Mirror Descent Either Converges to Nash or to Strong Coarse Correlated Equilibria in Bimatrix Games

Neural Information Processing Systems | May-31-2025, 01:51:48 GMT

As an immediate consequence, when the iterates of OMD are bounded away from being Nash equilibria in a bimatrix game, we guarantee convergence to an exact CCE after only O(1) iterations. Our results reveal that uncoupled no-regret learning algorithms can converge to CCE in general-sum games remarkably faster than to NE in, for example, zero-sum games. To establish this, we show that when OMD does not reach arbitrarily close to a NE, the (cumulative) regret of both players is not only negative, but decays linearly with time. Given that regret is the canonical measure of performance in online learning, our results suggest that cycling behavior of no-regret learning algorithms in games can be justified in terms of efficiency.
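
For readers unfamiliar with the dynamics being analyzed, the following is a minimal sketch of optimistic mirror descent in a bimatrix game (A, B) where both players maximize their own payoff. It assumes an entropic regularizer (so each mirror step is a multiplicative-weights update) and uses the previous utility vector as the prediction for the next round. These choices, and all function names, are assumptions made for illustration; the excerpt does not specify the regularizer or the implementation.

```python
import math


def mirror_step(log_weights, gains, eta):
    """Entropic mirror step in log space: w_i <- w_i * exp(eta * g_i), renormalized."""
    raw = [lw + eta * g for lw, g in zip(log_weights, gains)]
    peak = max(raw)
    log_z = peak + math.log(sum(math.exp(r - peak) for r in raw))
    return [r - log_z for r in raw]


def utilities(A, B, x, y):
    """Per-action utilities: (A y)_i for the row player, (B^T x)_j for the column player."""
    u_x = [sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x))]
    u_y = [sum(B[i][j] * x[i] for i in range(len(x))) for j in range(len(y))]
    return u_x, u_y


def optimistic_mirror_descent(A, B, eta=0.1, steps=1000):
    """Run both players' optimistic updates and return the last played strategies."""
    n, m = len(A), len(A[0])
    # Secondary ("hat") iterates, stored as log-probabilities, start uniform.
    log_x_hat = [-math.log(n)] * n
    log_y_hat = [-math.log(m)] * m
    x = [1.0 / n] * n
    y = [1.0 / m] * m
    for _ in range(steps):
        u_x, u_y = utilities(A, B, x, y)
        # Update the secondary iterates with the utilities just observed.
        log_x_hat = mirror_step(log_x_hat, u_x, eta)
        log_y_hat = mirror_step(log_y_hat, u_y, eta)
        # Play the next strategy using the same utilities as the prediction.
        x = [math.exp(v) for v in mirror_step(log_x_hat, u_x, eta)]
        y = [math.exp(v) for v in mirror_step(log_y_hat, u_y, eta)]
    return x, y
```

Each player only sees its own utility vector, so the sketch is uncoupled in the sense used in the abstract. Tracking regret or distance to equilibrium would require additional bookkeeping that is omitted here.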

artificial intelligence, equilibrium, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Transformers Represent Belief State Geometry in their Residual Stream

Adam S. Shai, Sarah E. Marzen, Lucas Teixeira (affiliations: Simplex; Department of Natural Sciences; PIBBSS)

Neural Information Processing Systems | May-31-2025, 01:29:46 GMT

What computational structure are we building into large language models when we train them on next-token prediction? Here, we present evidence that this structure is given by the meta-dynamics of belief updating over hidden states of the data-generating process. Leveraging the theory of optimal prediction, we anticipate and then find that belief states are linearly represented in the residual stream of transformers, even in cases where the predicted belief state geometry has highly nontrivial fractal structure. We investigate cases where the belief state geometry is represented in the final residual stream or distributed across the residual streams of multiple layers, providing a framework to explain these observations. Furthermore, we demonstrate that the inferred belief states contain information about the entire future, beyond the local next-token prediction that the transformers are explicitly trained on. Our work provides a general framework connecting the structure of training data to the geometric structure of activations inside transformers.
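
To make "belief updating over hidden states" concrete, here is a minimal sketch of a Bayesian belief update for a hidden Markov model. It assumes the data-generating process is described by one matrix per token, with entry `[i][j]` giving the probability of emitting that token while moving from hidden state `i` to hidden state `j`. The two-state process and the names `update_belief`, `T0`, and `T1` are hypothetical and are not taken from the paper.

```python
def update_belief(belief, token_matrix):
    """Return the posterior over hidden states after observing one token.

    belief:       list of probabilities over the current hidden state.
    token_matrix: token_matrix[i][j] = P(emit this token, next state j | state i).
    Also returns the probability of the observed token under the prior belief.
    """
    n = len(belief)
    unnormalized = [
        sum(belief[i] * token_matrix[i][j] for i in range(n)) for j in range(n)
    ]
    token_prob = sum(unnormalized)
    if token_prob == 0.0:
        raise ValueError("token has zero probability under the current belief")
    return [u / token_prob for u in unnormalized], token_prob


# Hypothetical two-state process: state 0 emits token 0 and stays put with
# probability 0.5, or emits token 1 and moves to state 1 with probability 0.5;
# state 1 always emits token 1 and returns to state 0.
T0 = [[0.5, 0.0], [0.0, 0.0]]
T1 = [[0.0, 0.5], [1.0, 0.0]]

belief = [0.5, 0.5]
for matrix in (T1, T0, T1):  # observe the token sequence 1, 0, 1
    belief, _ = update_belief(belief, matrix)
```

The sequence of belief vectors produced this way traces out points in the probability simplex. The set of all such reachable points is what the abstract refers to as the belief state geometry.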

artificial intelligence, belief revision, machine learning, (18 more...)

Neural Information Processing Systems

Country: